Deep Reinforcement Learning for Visual Object Tracking in Videos

نویسندگان

Da Zhang

Hamid Maei

Xin Wang

Yuan-Fang Wang

چکیده

Convolutional neural network (CNN) models have achieved tremendous success in many visual detection and recognition tasks. Unfortunately, visual tracking, a fundamental computer vision problem, is not handled well using the existing CNN models, because most object trackers implemented with CNN do not effectively leverage temporal and contextual information among consecutive frames. Recurrent neural network (RNN) models, on the other hand, are often used to process text and voice data due to their ability to learn intrinsic representations of sequential and temporal data. Here, we propose a novel neural network tracking model that is capable of integrating information over time and tracking a selected target in video. It comprises three components: a CNN extracting best tracking features in each video frame, an RNN constructing video memory state, and a reinforcement learning (RL) agent making target location decisions. The tracking problem is formulated as a decision-making process, and our model can be trained with RL algorithms to learn good tracking policies that pay attention to continuous, inter-frame correlation and maximize tracking performance in the long run. We compare our model with an existing neural-network based tracking method and show that the proposed tracking approach works well in various scenarios by performing rigorous validation experiments on artificial video sequences with ground truth. To the best of our knowledge, our tracker is the first neural-network tracker that combines convolutional and recurrent networks with RL algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Visual Tracking using Learning Histogram of Oriented Gradients by SVM on Mobile Robot

The intelligence of a mobile robot is highly dependent on its vision. The main objective of an intelligent mobile robot is in its ability to the online image processing, object detection, and especially visual tracking which is a complex task in stochastic environments. Tracking algorithms suffer from sequence challenges such as illumination variation, occlusion, and background clutter, so an a...

متن کامل

Tracking of Humans in Video Stream Using LSTM Recurrent Neural Network

In this master thesis, the problem of tracking humans in video streams by using Deep Learning is examined. We use spatially supervised recurrent convolutional neural networks for visual human tracking. In this method, the recurrent convolutional network uses both the history of locations and the visual features from the deep neural networks. This method is used for tracking, based on the detect...

متن کامل

Learning Visual Servoing with Deep Features and Fitted Q-Iteration

Visual servoing involves choosing actions that move a robot in response to observations from a camera, in order to reach a goal configuration in the world. Standard visual servoing approaches typically rely on manually designed features and analytical dynamics models, which limits their generalization capability and often requires extensive application-specific feature and model engineering. In...

متن کامل

Long-Term Visual Object Tracking Benchmark

In this paper, we propose a new long video dataset (called Track Long and Prosper TLP) and benchmark for visual object tracking. The dataset consists of 50 videos from real world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20 folds larger in average duration per sequence and more than 8 folds larger in terms of total covered duration, as compared to...

متن کامل

MBestStruck: M-Best diverse sampling for structured tracker

We approach the problem of model-free visual tracking of objects in videos. Modelfree tracking has its state-of-the-art in a class of methods called tracking-by-detection, as shown in recent benchmarks. Some top-performing methods use deep neural networks (i.e., convnets) to solve the learning-based steps of the tracking algorithm (e.g., bounding-box prediction and evaluation). Despite improvin...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1701.08936 شماره

صفحات -

تاریخ انتشار 2017

Deep Reinforcement Learning for Visual Object Tracking in Videos

نویسندگان

چکیده

منابع مشابه

Visual Tracking using Learning Histogram of Oriented Gradients by SVM on Mobile Robot

Tracking of Humans in Video Stream Using LSTM Recurrent Neural Network

Learning Visual Servoing with Deep Features and Fitted Q-Iteration

Long-Term Visual Object Tracking Benchmark

MBestStruck: M-Best diverse sampling for structured tracker

عنوان ژورنال:

اشتراک گذاری